We’ll compare the RCurl and httr packages
for downloading web data, parse the XML/HTML of the SOCR Main Page, and
scrape its content. First, we need to ensure the necessary packages are
installed and loaded.
library(RCurl)
web <- "https://wiki.socr.umich.edu/index.php/Main_Page"
socr_rcurl <- getURL(web, followlocation = TRUE)
str(socr_rcurl, nchar.max = 200)
## chr "<!DOCTYPE html>\n<html class=\"client-nojs\" lang=\"en\" dir=\"ltr\">\n<head>\n<meta charset=\"UTF-8\"/>\n<title>SOCR</title>\n<script>document.documentElement.className = document.do"| __truncated__
library(httr)
socr_httr <- GET("https://wiki.socr.umich.edu/index.php/Main_Page")
str(socr_httr[1:3])
## List of 3
## $ url : chr "https://wiki.socr.umich.edu/index.php/Main_Page"
## $ status_code: int 200
## $ headers :List of 15
## ..$ date : chr "Fri, 29 Sep 2023 11:14:07 GMT"
## ..$ server : chr "Apache"
## ..$ x-powered-by : chr "PHP/7.3.11"
## ..$ x-content-type-options: chr "nosniff"
## ..$ content-language : chr "en"
## ..$ x-ua-compatible : chr "IE=Edge"
## ..$ link : chr "</local/images/SOCR_3D_Logo_UM.png?55fd6>;rel=preload;as=image"
## ..$ vary : chr "Accept-Encoding,Cookie"
## ..$ expires : chr "Thu, 01 Jan 1970 00:00:00 GMT"
## ..$ cache-control : chr "private, must-revalidate, max-age=0"
## ..$ last-modified : chr "Wed, 03 May 2023 17:49:16 GMT"
## ..$ content-encoding : chr "gzip"
## ..$ content-length : chr "6824"
## ..$ content-type : chr "text/html; charset=UTF-8"
## ..$ set-cookie : chr "LBSESSIONID=1392759693.47873.0000; path=/; Httponly; Secure; SameSite=none"
## ..- attr(*, "class")= chr [1:2] "insensitive" "list"
Both methods provide similar results, but httr offers a
more user-friendly interface.
library(XML)
web.parsed <- htmlParse(socr_rcurl, asText = TRUE, encoding="UTF-8")
plain.text <- xpathSApply(web.parsed, "//p", xmlValue)
cleaned.text <- paste(plain.text, collapse = "\n")
cleaned.text <- gsub("\n+", "\n", cleaned.text) # Remove multiple newlines
cleaned.text <- gsub("^\\s+|\\s+$", "", cleaned.text) # Remove leading/trailing white spaces
cleaned.text <- gsub("(?<!Translate this page:)\n", " ", cleaned.text, perl = TRUE)
cleaned.text <- gsub("Translate this page:\n", "Translate this page: ", cleaned.text) # Remove newline after 'Translate this page:'
substr(cleaned.text, start=1, stop=256)
## [1] "It's online, therefore it exists! Translate this page: (default) Deutsch Español Français Italiano Português 日本語 България الامارات العربية المتحدة Suomi इस भाषा में Norge 한국어 中文 繁体中文 Русский Nederlands Ελληνικά Hrvatska Česká republika Danm"
Let’s parse a SOCR wiki page and scrape specific
elements, such as the page title and its meta tags.
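The code that produced the output below is not shown in the original; its print style matches the rvest/xml2 toolchain. A hedged sketch that would yield this kind of output (the function calls are assumptions, not taken from the original):

```r
# Sketch (assumes the rvest package): read the page, inspect its title,
# and collect the attributes of each <meta> tag, as in the output below.
library(rvest)

socr_page <- read_html("https://wiki.socr.umich.edu/index.php/Main_Page")
socr_page                                        # prints the {html_document} summary
socr_page %>% html_element("title") %>% html_text()
socr_page %>% html_elements("meta") %>% html_attrs()
```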
## {html_document}
## <html class="client-nojs" lang="en" dir="ltr">
## [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
## [2] <body class="mediawiki ltr sitedir-ltr mw-hide-empty-elt ns-0 ns-subject ...
## [1] "SOCR Data - SOCR"
## [[1]]
## http-equiv content
## "Content-Type" "text/html; charset=UTF-8"
##
## [[2]]
## charset
## "UTF-8"
##
## [[3]]
## name content
## "ResourceLoaderDynamicStyles" ""
##
## [[4]]
## name content
## "generator" "MediaWiki 1.31.6"
We will be analyzing the network graph of characters from Victor Hugo’s novel “Les Misérables”. First, we’ll load the necessary libraries and the data.
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
# Load the graph data
df <- read.table("I:/UBD PB/ZH5103 Predictive/Specialized/03_les miserablese_GraphData.txt", header = FALSE)
g <- graph_from_data_frame(df, directed = FALSE)
## IGRAPH 4fdcfe6 UN-- 77 254 --
## + attr: name (v/c)
The graph g is an undirected network representing connections between different entities. It consists of 77 nodes (or vertices) and 254 edges (or connections). Each vertex in this graph has an associated attribute named “name”, which is of character type. This attribute likely represents the label or identifier for each vertex. The graph provides a structured representation of the relationships between these entities, where the edges denote some form of association or interaction between pairs of nodes. Given the context of “Les Misérables”, it’s plausible that the nodes represent characters from the novel, and the edges signify some form of relationship or interaction between them.
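The degree listing below can be produced with igraph's `degree()`; the original code is not shown, but was presumably something like:

```r
# Sketch (assumes igraph is loaded and the graph g from above):
# node degrees, sorted from most to least connected.
library(igraph)
sort(degree(g), decreasing = TRUE)
```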
## Valjean Gavroche Marius Javert
## 36 22 19 17
## Thenardier Fantine Enjolras Courfeyrac
## 16 15 15 13
## Bossuet Bahorel Joly MmeThenardier
## 13 12 12 11
## Cosette Eponine Mabeuf Combeferre
## 11 11 11 11
## Feuilly Myriel Gueulemer Babet
## 11 10 10 10
## Claquesous Grantaire Tholomyes Prouvaire
## 10 10 9 9
## Montparnasse Bamatabois Listolier Fameuil
## 9 8 7 7
## Blacheville Favourite Dahlia Zephine
## 7 7 7 7
## Gillenormand MlleGillenormand Brujon MmeHucheloup
## 7 7 7 7
## Judge Champmathieu Brevet Chenildieu
## 6 6 6 6
## Cochepaille Fauchelevent LtGillenormand Simplice
## 6 4 4 4
## MlleBaptistine MmeMagloire Pontmercy Anzelma
## 3 3 3 3
## Woman2 Toussaint Marguerite Perpetue
## 3 3 2 2
## MmeBurgon Child1 Woman1 MotherInnocent
## 2 2 2 2
## Magnon MmePontmercy BaronessT Child2
## 2 2 2 2
## Labarre Jondrette Napoleon CountessDeLo
## 1 1 1 1
## Geborand Champtercier Cravatte Count
## 1 1 1 1
## OldMan MmeDeR Isabeau Gervais
## 1 1 1 1
## Scaufflaire Boulatruelle Gribier MlleVaubois
## 1 1 1 1
## MotherPlutarch
## 1
The degree of a node in an undirected graph is the number of connections it has.
The graph represents the interactions between characters in “Les Misérables”. The most connected character is Valjean with 36 connections, followed by Gavroche with 22, and Marius with 19. On the other end of the spectrum, several characters, such as Labarre, Jondrette, Napoleon, and CountessDeLo, among others, have only a single connection. Most characters have multiple interactions, with the majority having more than 5 connections.
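The betweenness scores listed next were presumably computed with igraph's `betweenness()` (the original code is not shown):

```r
# Sketch (assumes igraph is loaded and the graph g from above):
# betweenness centrality, sorted from most to least central.
library(igraph)
sort(betweenness(g), decreasing = TRUE)
```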
## Valjean Myriel Gavroche Marius
## 1624.4688004 504.0000000 470.5706319 376.2925926
## Fantine Thenardier Javert MlleGillenormand
## 369.4869418 213.4684805 154.8449450 135.6569444
## Enjolras Tholomyes Bossuet MmeThenardier
## 121.2770669 115.7936423 87.6479030 82.6568934
## Mabeuf Fauchelevent MmeBurgon Cosette
## 78.8345238 75.5000000 75.0000000 67.8193223
## Gillenormand Eponine Simplice Bamatabois
## 57.6002715 32.7395194 24.6248408 22.9166667
## Pontmercy Courfeyrac Gueulemer Babet
## 19.7375000 15.0110352 14.1370943 14.1370943
## Claquesous Montparnasse Bahorel Joly
## 13.8561420 11.0404151 6.2286417 6.2286417
## Combeferre Feuilly Brujon MmePontmercy
## 3.5629149 3.5629149 1.2500000 1.0000000
## Magnon Grantaire MlleBaptistine Labarre
## 0.6190476 0.4285714 0.0000000 0.0000000
## MmeMagloire Listolier Fameuil Blacheville
## 0.0000000 0.0000000 0.0000000 0.0000000
## Favourite Dahlia Zephine Marguerite
## 0.0000000 0.0000000 0.0000000 0.0000000
## Perpetue Judge Champmathieu Brevet
## 0.0000000 0.0000000 0.0000000 0.0000000
## Chenildieu Jondrette LtGillenormand Prouvaire
## 0.0000000 0.0000000 0.0000000 0.0000000
## Child1 Napoleon CountessDeLo Geborand
## 0.0000000 0.0000000 0.0000000 0.0000000
## Champtercier Cravatte Count OldMan
## 0.0000000 0.0000000 0.0000000 0.0000000
## MmeDeR Isabeau Gervais Scaufflaire
## 0.0000000 0.0000000 0.0000000 0.0000000
## Woman1 Cochepaille Boulatruelle Anzelma
## 0.0000000 0.0000000 0.0000000 0.0000000
## Woman2 MotherInnocent Gribier MlleVaubois
## 0.0000000 0.0000000 0.0000000 0.0000000
## BaronessT MotherPlutarch Toussaint Child2
## 0.0000000 0.0000000 0.0000000 0.0000000
## MmeHucheloup
## 0.0000000
Betweenness centrality measures how often a node lies on the shortest paths connecting other nodes in the network.
In the character network of “Les Misérables”, Valjean emerges as the most central figure with a betweenness centrality score of 1624.47, signifying his role as a key bridge or intermediary within the network. The character Myriel is next with a score of 504, followed closely by Gavroche with a score of 470.57. This suggests that these characters are crucial connectors in the story, linking different clusters or groups of characters. Conversely, many characters, such as Labarre, MmeMagloire, and Listolier, among others, have a centrality score of zero, indicating that they don’t act as bridges or intermediaries in the network.
Nodes with higher degrees and centrality scores are generally more important in the network.
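The two top-10 listings below can be obtained by combining `sort()` with `head()`; a sketch of the likely (unshown) code:

```r
# Sketch (assumes igraph is loaded and the graph g from above):
# the ten best-connected and the ten most central characters.
library(igraph)
head(sort(degree(g), decreasing = TRUE), 10)
head(sort(betweenness(g), decreasing = TRUE), 10)
```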
## Valjean Gavroche Marius Javert Thenardier Fantine Enjolras
## 36 22 19 17 16 15 15
## Courfeyrac Bossuet Bahorel
## 13 13 12
## Valjean Myriel Gavroche Marius
## 1624.4688 504.0000 470.5706 376.2926
## Fantine Thenardier Javert MlleGillenormand
## 369.4869 213.4685 154.8449 135.6569
## Enjolras Tholomyes
## 121.2771 115.7936
The output will display the top 10 nodes (characters) with the highest degree and centrality, indicating their importance in the network.
If we were to assume the graph is directed, the relationships would be one-way, meaning one character interacts with another, but it may not be reciprocated. This would result in the calculation of in-degree (incoming connections) and out-degree (outgoing connections) for each node. The interpretations of centrality measures would also be different in the context of directed relationships.
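As a hypothetical illustration of the directed case, the same edge list could be re-read with `directed = TRUE` and the two degree modes computed separately:

```r
# Hypothetical sketch (assumes igraph and the edge-list data frame df from
# above): in-degree counts incoming edges, out-degree counts outgoing ones.
library(igraph)
g_dir <- graph_from_data_frame(df, directed = TRUE)
head(sort(degree(g_dir, mode = "in"), decreasing = TRUE))
head(sort(degree(g_dir, mode = "out"), decreasing = TRUE))
```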
We’ll be analyzing the data on adults with heart attacks, performing data processing, model training, and parallel computing. Install and load the necessary packages.
#install.packages(c("openxlsx", "rio", "data.table","ff","ffbase", "dplyr", "randomForest", "caret", "parallel", "doParallel", "foreach"))
library(openxlsx)
data <- read.xlsx("I:/UBD PB/ZH5103 Predictive/Specialized/CaseStudy12_ AdultsHeartAttack_Data (1).xlsx")
head(data)
## Patient DIAGNOSIS SEX DRG DIED CHARGES LOS AGE
## 1 1 41041 F 122 0 4752 10 79
## 2 2 41041 F 122 0 3941 6 34
## 3 3 41091 F 122 0 3657 5 76
## 4 4 41081 F 122 0 1481 2 80
## 5 5 41091 M 122 0 1681 1 55
## 6 6 41091 M 121 0 6379 9 84
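The row labels `1:`, `2:`, … in the next printout are data.table's print style; the conversion step is not shown in the original, but was presumably something like:

```r
# Sketch (assumed, not shown in the original): convert the data frame
# to a data.table and print its first rows.
library(data.table)
dt <- as.data.table(data)
head(dt)
```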
## Patient DIAGNOSIS SEX DRG DIED CHARGES LOS AGE
## 1: 1 41041 F 122 0 4752 10 79
## 2: 2 41041 F 122 0 3941 6 34
## 3: 3 41091 F 122 0 3657 5 76
## 4: 4 41081 F 122 0 1481 2 80
## 5: 5 41091 M 122 0 1681 1 55
## 6: 6 41091 M 121 0 6379 9 84
## Loading required package: bit
##
## Attaching package: 'bit'
## The following object is masked from 'package:data.table':
##
## setattr
## The following object is masked from 'package:RCurl':
##
## clone
## The following object is masked from 'package:base':
##
## xor
## Attaching package ff
## - getOption("fftempdir")=="C:/Users/Jun/AppData/Local/Temp/Rtmpo9X3KW/ff"
## - getOption("ffextension")=="ff"
## - getOption("ffdrop")==TRUE
## - getOption("fffinonexit")==TRUE
## - getOption("ffpagesize")==65536
## - getOption("ffcaching")=="mmnoflush" -- consider "ffeachflush" if your system stalls on large writes
## - getOption("ffbatchbytes")==16777216 -- consider a different value for tuning your system
## - getOption("ffmaxbytes")==536870912 -- consider a different value for tuning your system
##
## Attaching package: 'ff'
## The following objects are masked from 'package:utils':
##
## write.csv, write.csv2
## The following objects are masked from 'package:base':
##
## is.factor, is.ordered
## [1] NA
Standard R functions cannot be applied directly to ff objects; the ffbase package provides ff-aware replacements.
## Registered S3 methods overwritten by 'ffbase':
## method from
## [.ff ff
## [.ffdf ff
## [<-.ff ff
## [<-.ffdf ff
##
## Attaching package: 'ffbase'
## The following objects are masked from 'package:base':
##
## %in%, table
## 'data.frame': 150 obs. of 5 variables:
## $ DRG : int 122 122 122 122 122 121 121 121 121 123 ...
## $ DIED : int 0 0 0 0 0 0 0 0 0 1 ...
## $ CHARGES: Factor w/ 146 levels ".","10003","10020",..: 86 76 69 25 33 99 11 32 78 38 ...
## $ LOS : int 10 6 5 2 1 9 15 15 2 1 ...
## $ AGE : int 79 34 76 80 55 84 84 70 76 65 ...
## [1] 121.7533
## [1] 0.1066667
##
## . 10003 10020 10089 10173 1035 10530 1070 10750 1095 10959 11080 11163
## 2 1 1 1 1 1 1 1 1 1 1 1 1
## 11969 1205 1229 12403 12494 1251 1354 1383 1409 1416 1432 1481 1505
## 1 1 1 1 1 1 1 1 1 1 1 2 1
## 1510 1582 1595 16429 1655 16584 1681 1684 17137 1805 1974 1989 2173
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2227 2267 2274 2308 2423 2441 2521 2556 2584 2646 2671 2750 2886
## 1 2 1 1 1 1 1 1 1 1 1 1 1
## 2905 3033 3036 3090 3096 3140 3148 3173 3215 3235 3315 3318 3386
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 3400 3440 3512 3657 3694 3739 3790 3882 3918 3931 3941 3988 4015
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 4068 4101 4174 4317 4392 4434 4662 4752 4864 4896 4974 4990 4994
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 5001 5461 5861 6065 6145 6298 6316 6379 6491 6539 6736 6912 6977
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 701 7039 7049 7160 7165 7182 7214 7223 7240 7307 733 7330 7429
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 7454 7472 7497 7518 7589 7676 7856 7901 7916 8066 815 8227 8260
## 2 1 1 1 1 1 1 1 1 1 1 1 1
## 8274 8358 8596 8638 8646 8732 8817 8830 8858 9091 9117 9155 9158
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 9279 9517 9863
## 1 1 1
## [1] 4.166667
## [1] 66.32667
Using the ffbase package, we can perform efficient data manipulations on large datasets. For columns with integer data, we can calculate the mean, while for factor columns, we can compute the frequency.
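A hedged sketch of the kind of calls that produce such summaries, shown here on toy data since the original conversion code is not included (`data_ff` is a hypothetical name):

```r
# Sketch with toy data: ffbase supplies ff-aware versions of mean() and
# table(), so they can be applied to ffdf columns directly.
library(ffbase)
data_ff <- ff::as.ffdf(data.frame(DRG     = c(122L, 122L, 121L),
                                  CHARGES = factor(c("4752", "3941", "6379"))))
mean(data_ff$DRG)       # ff-aware mean on an integer column
table(data_ff$CHARGES)  # ff-aware frequency table of a factor column
```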
First, let’s create a balanced dataset:
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:bit':
##
## symdiff
## The following objects are masked from 'package:data.table':
##
## between, first, last
## The following objects are masked from 'package:igraph':
##
## as_data_frame, groups, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
set.seed(123)
died_data <- data[data$DIED == 1, ]
alive_data <- data[data$DIED == 0, ] %>% sample_n(size = nrow(died_data))
balanced_data <- rbind(died_data, alive_data)
Now, train the random forest model:
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:dplyr':
##
## combine
set.seed(123)
# DIED is stored as an integer, so randomForest fits a regression forest here;
# convert DIED to a factor if a classification forest is intended.
rf_model <- randomForest(DIED ~ DIAGNOSIS + SEX + DRG + CHARGES + LOS + AGE, data = balanced_data, ntree = 20000)
rf_model
##
## Call:
## randomForest(formula = DIED ~ DIAGNOSIS + SEX + DRG + CHARGES + LOS + AGE, data = balanced_data, ntree = 20000)
## Type of random forest: regression
## Number of trees: 20000
## No. of variables tried at each split: 2
##
## Mean of squared residuals: 0.02157022
## % Var explained: 91.37
First, we set up training control:
## Loading required package: ggplot2
##
## Attaching package: 'ggplot2'
## The following object is masked from 'package:randomForest':
##
## margin
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:httr':
##
## progress
set.seed(123)
train_control <- trainControl(method = "cv", number = 5)
system.time({
model_caret <- train(DIED ~ DIAGNOSIS + SEX + DRG + CHARGES + LOS + AGE, data = balanced_data, method = "rf", trControl = train_control, tuneLength = 5)
})
## user system elapsed
## 0.67 0.04 0.71
In R, system.time() measures code execution with “user” time (active CPU duration), “system” time (CPU time for system operations), and “elapsed” time (total real-world duration).
In general, when you’re trying to assess how long your code takes to run from a user’s perspective, the elapsed time is the most relevant. But the breakdown into user and system can give insights into where the time is spent (in user-level computation vs. system-level calls).
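As a toy illustration of the distinction (exact timings will vary by machine):

```r
# Sleeping consumes almost no CPU, so elapsed (wall-clock) time far
# exceeds user (CPU) time.
st <- system.time(Sys.sleep(0.5))
st[["elapsed"]]    # roughly 0.5 seconds of real time
st[["user.self"]]  # near zero: the CPU was mostly idle
```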
Detect cores:
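The core-detection step itself is not shown; it presumably used `parallel::detectCores()`:

```r
# Count the logical cores available on this machine.
library(parallel)
n_cores <- detectCores()
n_cores
```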
Train the model in parallel:
set.seed(123)
train_control_parallel <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
system.time({
model_caret_parallel <- train(DIED ~ DIAGNOSIS + SEX + DRG + CHARGES + LOS + AGE, data = balanced_data, method = "rf", trControl = train_control_parallel, tuneLength = 5)
})
## user system elapsed
## 0.66 0.03 0.68
In the computational tasks, both the standard caret method and the parallelized training were closely matched in terms of elapsed time. The parallelized training was marginally quicker. While the user time for the parallelized approach was slightly reduced compared to the caret method, the system time was somewhat shorter in the caret method. Overall, the differences in execution times between the two methods were minimal, with the parallelized training offering a slight edge in efficiency.
## Loading required package: foreach
## Loading required package: iterators
library(doParallel)  # attaches foreach and iterators as dependencies
library(foreach)
registerDoParallel(makeCluster(detectCores()))
system.time({
# Split the 20,000 trees evenly across the cores; each worker grows its
# share and randomForest::combine() merges the forests into one model.
n_cores <- detectCores()
rf_model_parallel <- foreach(ntree = rep(ceiling(20000 / n_cores), n_cores), .combine = combine, .packages = "randomForest") %dopar% {
randomForest(DIED ~ DIAGNOSIS + SEX + DRG + CHARGES + LOS + AGE, data = balanced_data, ntree = ntree)
}
})
## user system elapsed
## 0.02 0.04 0.62
In this computational comparison, the standard caret method and the parallelized caret training had very close elapsed times, with the parallelized approach being slightly faster. The standard caret method does not utilize parallel processing and trains on a single core. Parallelized training distributes the cross-validation folds across multiple cores, which will likely yield a significant speedup, especially on large datasets. The most noticeable improvement came from using foreach and doParallel for parallelized random forest training, which was significantly more time-efficient than the other two approaches. It divides the total number of trees (20,000 in this case) across the available cores; each core builds its own random forest, and the forests are then combined. This approach minimizes overhead and maximizes parallel efficiency.
## Loading required package: magrittr
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:ff':
##
## add
##
## Attaching package: 'imager'
## The following object is masked from 'package:magrittr':
##
## add
## The following object is masked from 'package:randomForest':
##
## grow
## The following object is masked from 'package:dplyr':
##
## where
## The following object is masked from 'package:ff':
##
## add
## The following object is masked from 'package:igraph':
##
## spectrum
## The following objects are masked from 'package:stats':
##
## convolve, spectrum
## The following object is masked from 'package:graphics':
##
## frame
## The following object is masked from 'package:base':
##
## save.image
Now, let’s create a Gaussian kernel and use the C++ function to convolve the image. We will use the Rcpp package to write and execute C++ code.
library(Rcpp)
cppFunction('
NumericMatrix gaussianSmooth(NumericMatrix img) {
int rows = img.nrow();
int cols = img.ncol();
NumericMatrix result(rows, cols);
// Define a 2D Gaussian kernel
NumericMatrix kernel(3,3);
kernel(0,0) = 1/16.0; kernel(0,1) = 2/16.0; kernel(0,2) = 1/16.0;
kernel(1,0) = 2/16.0; kernel(1,1) = 4/16.0; kernel(1,2) = 2/16.0;
kernel(2,0) = 1/16.0; kernel(2,1) = 2/16.0; kernel(2,2) = 1/16.0;
// Convolve the image with the kernel
for(int i = 1; i < (rows - 1); i++) {
for(int j = 1; j < (cols - 1); j++) {
double sum = 0;
for(int ki = -1; ki <= 1; ki++) {
for(int kj = -1; kj <= 1; kj++) {
sum += img(i + ki, j + kj) * kernel(ki + 1, kj + 1);
}
}
result(i, j) = sum;
}
}
return result;
}
')
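Note that the image object `img` used next is never loaded anywhere in this excerpt; presumably it came from imager's `load.image()` on an MRI file. For a runnable sketch we substitute imager's bundled example image:

```r
# Stand-in for the original (unshown) image-loading step; replace `boats`
# with load.image("<path-to-image>") to use a real file.
library(imager)
img <- boats
```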
img_gray <- grayscale(img)
img_matrix <- as.matrix(img_gray)
smoothed_img <- gaussianSmooth(img_matrix)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:imager':
##
## highlight
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:rio':
##
## export
## The following object is masked from 'package:igraph':
##
## groups
## The following object is masked from 'package:httr':
##
## config
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
# Original Image
p1 <- plot_ly(
x = 1:ncol(img_matrix),
y = 1:nrow(img_matrix),
z = img_matrix,
type = "heatmap",
colorscale = "Greys",
zmin = 0,
zmax = 1
)
# Smoothed Image
p2 <- plot_ly(
x = 1:ncol(smoothed_img),
y = 1:nrow(smoothed_img),
z = smoothed_img,
type = "heatmap",
colorscale = "Greys",
zmin = 0,
zmax = 1
)
combined_plot <- subplot(p1, p2)
# Add titles
combined_plot %>% layout(
annotations = list(
list(
text = "Original Image",
xref = "paper",
yref = "paper",
x = 0.15, # Relative position within the plot area
y = 1.05, # Relative position above the plot area
showarrow = F,
font = list(size = 14)
),
list(
text = "Smoothed Image",
xref = "paper",
yref = "paper",
x = 0.85, # Relative position within the plot area
y = 1.05, # Relative position above the plot area
showarrow = F,
font = list(size = 14)
)
)
)
import numpy as np
#Retrieve both Original and Smoothed Images in Python
img_matrix_python = np.array(r.img_matrix)
smoothed_img_python = np.array(r.smoothed_img)
# Compute the difference between the original and smoothed images
difference_img = img_matrix_python - smoothed_img_python
difference_img
## array([[0.00392157, 0.00392157, 0.00392157, ..., 0.00392157, 0.00392157,
## 0.00392157],
## [0.00392157, 0. , 0. , ..., 0. , 0. ,
## 0.00392157],
## [0.00392157, 0. , 0. , ..., 0. , 0. ,
## 0.00392157],
## ...,
## [0.00392157, 0. , 0. , ..., 0. , 0. ,
## 0.00392157],
## [0.00392157, 0. , 0. , ..., 0. , 0. ,
## 0.00392157],
## [0.00392157, 0.00392157, 0.00392157, ..., 0.00392157, 0.00392157,
## 0.00392157]])
The Python code computes the difference between the original and smoothed images to highlight the areas most affected by the smoothing process. The resulting difference_img captures these alterations: values close to 0 indicate little to no change, while larger values point to bigger modifications. In this particular output, the border values of 0.00392157 (exactly 1/255, one gray level) arise because the C++ function leaves the one-pixel border of the result at zero rather than convolving it, so the difference there simply equals the original pixel intensity; the interior zeros indicate flat regions that the smoothing left essentially unchanged. This difference image thus offers insights into the regions most influenced by the smoothing operation.
##
## Attaching package: 'reticulate'
## The following object is masked from 'package:rio':
##
## import
diff_img <- py$difference_img
# Display all three images using plot_ly in 2D
p1 <- plot_ly(z = as.matrix(img_gray), type = "heatmap")
p2 <- plot_ly(z = smoothed_img, type = "heatmap")
p3 <- plot_ly(z = diff_img, type = "heatmap")
combined_plot<-subplot(p1, p2, p3)
# Add titles
combined_plot %>% layout(
annotations = list(
list(
text = "Original Image",
xref = "paper",
yref = "paper",
x = 0.1, # Relative position within the plot area
y = 1.05, # Relative position above the plot area
showarrow = F,
font = list(size = 14)
),
list(
text = "Smoothed Image",
xref = "paper",
yref = "paper",
x = 0.5, # Relative position within the plot area
y = 1.05, # Relative position above the plot area
showarrow = F,
font = list(size = 14)
),list(
text = "Difference Image",
xref = "paper",
yref = "paper",
x = 0.95, # Relative position within the plot area
y = 1.05, # Relative position above the plot area
showarrow = F,
font = list(size = 14)
)
)
)
The reticulate package in R facilitates seamless integration between R and Python. By installing and loading reticulate, users can execute Python code directly within an R environment, convert R data structures to Python, and retrieve Python objects back into R. This interplay allows for the harnessing of capabilities from both languages in a single workflow, bridging the gap between two of the most popular data science languages.
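A minimal sketch of the R/Python round trip described above (assumes reticulate can locate a working Python installation):

```r
# Run Python code from R, then pull the resulting object back into R.
library(reticulate)
py_run_string("x = [1, 2, 3]")  # execute Python in the embedded session
py$x                            # retrieve the Python list as an R object
r_to_py(c(4, 5, 6))             # convert an R vector to a Python object
```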
The R code visualizes the original, smoothed, and difference images side by side using the plot_ly function. It highlights the raw data in the original image, the effect of Gaussian smoothing in the smoothed image, and the specific changes brought about by the smoothing in the difference image. This side-by-side comparison allows for an immediate understanding of the impact of the smoothing process on the image.
p1_3d <- plot_ly(z = as.matrix(img_gray), type = "surface") %>% layout(title = "Original Image 3D")
p2_3d <- plot_ly(z = smoothed_img, type = "surface") %>% layout(title = "Smoothed Image 3D")
p3_3d <- plot_ly(z = diff_img, type = "surface") %>% layout(title = "Difference Image 3D")
p1_3d
The R code visualizes the original, smoothed, and difference images in 3D. The “Original Image 3D” displays the MRI’s inherent intensity variations. The “Smoothed Image 3D” shows the effect of Gaussian smoothing, softening the image’s features. The “Difference Image 3D” highlights where the smoothing had the most influence. This 3D perspective underscores the changes brought by the smoothing process, aiding in understanding its impact on the image.